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Abstract. A new generation of observational science instruments is dramat- 
ically increasing collected data volumes in a range of fields. These instru- 
ments include the Square Kilometre Array (SKA), Large Synoptic Survey Tele- 
scope (LSST), terrestrial sensor networks, and NASA satellites participating in 
"decadal survey" missions. Their unprecedented coverage and sensitivity will 
likely reveal wholly new categories of unexpected and transient events. Com- 
mensal methods passively analyze these data streams, recognizing anomalous 
events of scientific interest and reacting in real time. We report on a case ex- 
ample: V-FASTR, an ongoing commensal experiment at the Very Long Base- 
line Array (VLBA) that uses online adaptive pattern recognition to search 
for anomalous fast radio transients. V-FASTR triages a millisecond-resolution 
stream of data and promotes candidate anomalies for further offline analy- 
sis. It tunes detection parameters in real time, injecting synthetic events to 
continually retrain itself for optimum performance. This self-tuning approach 
retains sensitivity to weak signals while adapting to changing instrument con- 
figurations and noise conditions. The system has operated since July 2011, 
making it the longest-running real time commensal radio transient experiment 
to date. 

Keywords: Radio Astronomy, Pattern Recognition, Real Time Machine Learn- 
ing, Time Series Analysis, Fast Radio Transients 



1. Introduction 

The next generation of scientific instruments will dramatically increase collected 
data volumes in multiple disciplines. Astronomers will soon face petascale data 
streams from instruments like the Large Synoptic Survey Telescope and Square 
Kilometer Array [1]. The environmental remote sensing community will see orders 
of magnitude increases in data volume from "decadal survey" satellite missions such 
as OCO-2, HyspIRI and GeoCAPE. These instruments will almost certainly observe 
secondary transient events and anomalies of scientific interest alongside their pri- 
mary missions. Examples of transient activities include orbiters that serendipitously 
observe wildfires or erupting volcanoes and ground-based telescopes that observe 
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supernovae and other momentary astronomical events. These generaUy require im- 
mediate action, either to trigger additional observations or to archive contextual 
data needed for validation. The time window for foUowup action is typically quite 
short, demanding efficient real time detection algorithms. 

This work describes a case study investigating fast radio transients. These 
millisecond-scale pulses of radio frequency energy may be produced by astronomical 
events such as Gamma Ray Bursts, intermittent radio pulsars known as RRATs, 
or possibly more extreme events such as merging black holes [2, 6 . At the time 
of this writing, few candidate transients have been discovered in past archives [S], 
and there have been no wholly conclusive detections of the most exotic events. 
More examples are needed to validate these detections and characterize the pop- 
ulation. To this end, passive commensal analyses, which operate in piggyback on 
other investigations, are an appealing way to accumulate significant telescope time. 

Radio transient detection typifies common challenges of large-scale commensal 
searches. One is consistency. Commensal analysis must cope with a range of 
different interference conditions and configurations. It must also keep pace with 
any instrument upgrades and modifications. Another challenge is data volume. 
The system operates continuously with high duty cycle and processing loads. This 
requires an automated system that can triage the data stream, promoting only the 
most important segments to storage and analyst attention. 

Here we describe the Very Long Baseline Array Fast Transients Experiment (V- 
FASTR) , a system for commensal detection of fast radio transient events [121 HI] ■ 
V-FASTR is a pathfinder project for a larger transient detection system known as 
CRAFT. It searches data from the Very Long Baseline Array (VLBA) for transient 
events and anomalies. Previous work analyzed event detection algorithms using 
simulation and controlled tests prior to deployment |llj . Here we report on the 
actual performance during the first year of operations. We also describe a novel new 
approach for self-tuning of detection parameters by injecting synthetic events into 
the data stream. This adapts to a range of instrument conditions, detecting rare, 
subtle astronomical events while retaining resilience to noise. As of this writing, 
V-FASTR is the single longest running commensal transient search to date and an 
instructive example of real time online event detection. 

2. The V-FASTR System 

The Very Long Baseline Array (VLBA) consists of ten identical antennas at 
locations across the western hemisphere [9]. Typical VLBA operations point all 
of the telescopes at a common location to perform interferometric imaging with 
very high spatial resolution. Each antenna records raw voltage samples that are 
sent in physical disks to the array operations center in Socorro, NM. Here they are 
time-aligned, correlated, and processed to form an interferometric radio sky image. 
After this step the disks are erased for reuse. 

V-FASTR must identify any transients in the short interval before the disks are 
erased. The original raw voltage data is important for validation because it retains 
the phase information of the signal, allowing for vital post-analyses such as imaging 
(finding the physical position of a pulse on the sky), coherent summing (combin- 
ing phased signals to improve sensitivity), and coherent dedispersion (removing 
dispersive smoothing in the voltage domain). Unfortunately, voltage data is so vo- 
luminous that it is impossible to archive more than a small fraction. The V-FASTR 
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Figure 1 . V-FASTR design and interfaces with the VLBA Corre- 
lator. The dashed boxes are the adaptive self-tuning system, which 
runs in alongside the main detection pipeline. Image Credit: 
NRAO/IAU 

commensal detection system [121 solves this by searching for events in real time and 
saving only the most promising data. It acts as a plugin to the DiFX software cor- 
relator [3], which we have modified to broadcast antenna autocorrelations over a 
secondary network. In this mode the correlator serves as a spectrometer measuring 
the channelized, squared, and accumulated power at each antenna across 32 fre- 
quency channels. The integration time interval is approximately 1 ms. V-FASTR 
searches this data stream and notifies the correlator of any detected events. At the 
end of the correlation process the system saves a copy of these candidate segments 
to long-term storage for more thorough analysis. Figure [T] illustrates the V-FASTR 
system design. The commensal analysis runs on a rack-mounted computer with six 
modern processors but typically uses only a fraction of these resources. We describe 
each software component in greater detail below. 

2.1. Dedispersion search. The first stage of the V-FASTR analysis is a dedisper- 
sion search, effectively a matched filter search based on knowledge of how signals 
propagate through the interstellar medium. It corrects for the distortion caused 
by free electrons in the space between the source and the receiver. Following [7, , a 
broadband EM pulse experiences a frequency-dependent delay: 

(1) t2 - = 4.15ms DM[(i/i/GHz)-2) - (zya/GHz)-^)] 

The delay in observed signal arrival between two frequency channels i^i and z/2 is 
dependent on the bandwidth, the center frequency, and the Dispersion Measure 
(DM) of the source. This last value is related to distance and the density of the 
interstellar medium. The DM is not known for a new source. Consequently, the 
V-FASTR system performs a dedispersion search over many candidate DMs [T] . For 
each timestep it tests a range of possible dispersion measures, equivalent to applying 
a bank of matched filters with appropriate frequency /time profiles. Dedispersion 
searching has proven effective for the detection of pulsars and other astronomical 
phenomena. 

A dedispersion search can also help separate genuine astronomical signals from 
Radio Frequency Interference (RFI). Local interference typically manifests as a 
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broadband signal with no dispersion, since the pulse originates locally and all fre- 
quencies arrive simultaneously. Figure [2] (Left) is typical of the input data arriving 
to the V-FASTR system from the correlator. This dynamic spectrum image displays 
the data as a matrix of signal powers channelized into discrete time and frequency 
bins. The pixel intensity shows observed power, the accumulated squared voltage 
measured by a station. This segment contains several kinds of local interference such 
as narrow-band RFI limited to a few frequency channels and a bright momentary 
broadband signal near 0.14 seconds. The panel at right shows a real astrophysi- 
cal event: a single pulse from the pulsar J1919+0021, which has an actual DM of 
90.315. Note that the dispersion bends the pulse slightly; the lower frequency com- 
ponents experience greater dispersion delay. Searching over many DMs transforms 
the initial spectral data into a vector of approximately 100 matched filter scores - 
one per DM searched - for each of 10 antennas. 




Figure 2. Left: Examples of radio interference, including chan- 
nelized, momentary, and band-level changes. Right: A single pulse 
from pulsar J1919+0021, with a classic dispersed pulse profile. 
This plot uses an incoherent sum across all VLBA stations to im- 
prove contrast; in reality the dedispersion search analyzes each sta- 
tion's data stream separately. The times shown here are consistent 
with the dedispersed time series shown later in Figure [3] 

2.2. Robust detection of transients. The VLBA was designed to have low sys- 
tem temperatures and good gain stability over long timescales. It is conventionally 
used in a low time resolution regime. In contrast, V-FASTR measures signals 
at high time resolution and experiences additional interference due to momentary 
RFI, background signal discontinuities, varying gains, and other artifacts. We must 
account for these effects in order to limit the number of false detections. 

We exploit the fact that the stations are geographically separated, so interference 
only appears at a single station while true astrophysical signals are correlated across 
many stations [TT]. The top nine rows of Figure [s] show the signal from each antenna 
after dedispersion to a DM of 90. An arrow shows a real astronomical pulse from 
the pulsar J1919-I-0021. The pulse appears in multiple stations simultaneously, 
indicating that it is isolated on the celestial sphere. Because we do not know the 
correct DM in advance, many tens of these time series are produced simultaneously 
and the system must analyze them all. This provides a real- valued input vector 
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having dimensionality of order one hundred. The task amounts to a multivariate 
pattern recognition problem with millisecond time requirements, a nonstationary 
background distribution, and a very low tolerance for false positives (10~^ false 
positives per timestep). 
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Figure 3. Detection of a single pulse from pulsar J1919+0021. 
The top 9 panels show dedispersed time series from different VLBA 
antennas. Non-astrophysical artifacts and interference events ap- 
pear at several stations. Such differences are often caused by RFI 
but could also be caused by maintenance issues, weather, and the 
source position above each station's horizon. The bottom panel 
shows the scalar output of the adaptive detection score. An arrow 
indicates the event - the only apparent pulse from this source. It 
was missed by the standard detection approach but successfully 
found by the V-FASTR system. 

We represent each sample in the dedispersed time series as a multi-element vector 
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where d is the number of dispersion measures and n the number of 
stations. We can formalize the detection rule as a function /fe(x) that maps the 
dedispersed time series to a scalar score which is high for true transients but low 
for interference. We anticipate that any astrophysical fast transients will be very 
rare. For instance, [HI HH] estimate the rate in a local 1 Gpc'^ volume to be one per 
square degree of sky per 1000 hours. For the VLBA primary beam field of view 

5 



of 0.27 deg'^ and 1 nis temporal quantization this amounts to a target/background 
fraction of approxmately 1 x 10~^° [T51IT^ . 

The conventional approach to signal detection in VLBA data is to sum the signals 
from all antennas. As described in previous work [llj, the V-FASTR system favors 
a simple robust estimator that excises one or more extreme values and averages the 
rest. First the data stream from each station is noise-whitened, standardized to 
have zero mean and unit standard deviation, dcdispersed, and finally filtered using 
a high-pass filter with a 100 ms diameter. We then apply a separate matched filter 
for each DM and store the result as a multivariate time series. Suppose Xj^ is the 
matched filter result from station i at dispersion measure d. We sort the antennas 
in order of increasing signal strength for each DM and sum all but the most extreme 
k station signals to make a combined signal. The maximum response over all DMs 
becomes the overall score for that timestep: 

^ n—k 

(2) Detect if /fc(x) > t for /fe(x) = sup — 

d [n — K) .^^ 

This is most appropriate if the stations provide independent and identically dis- 
tributed estimates of some common source perturbed by occasional sporadic noise. 
Operationally we define r in terms of a Signal to Noise Ratio (SNR) given in stan- 
dard deviations from the mean. 

The best value of k depends on noise statistics. V-FASTR uses incoherent addi- 
tion that ignores phase information, so the measured SNR for a real event approx- 
imately follows the relationship \/n — k and excising too many antennas reduces 
sensitivity. However, excising too few extreme values increases the false positive 
rate. The optimal tradeoff depends on background noise and the number of anten- 
nas participating in the observation. 

We formalize the expected loss for a particular value of k using Bayesian decision 
theory. Each timestep is associated with a background event yo or a transient event 
2/1, with transient events much rarer than the background. We posit a misclassifica- 
tion cost Co for a false positive detection, and Ci for missing a real transient event. 
We optimize k and r to minimize Bayes' loss, i.e. the expected loss considering 
both error probabilities. 

(3) L(r, k) = ci p(/fc(x) < T |yi) p{yi) + cq p{fk{x) > t \yo) p{yo) 

(4) = a p(/fc(x) < T |yi) -t-p(/fe(x) > r {ya) 

The a term is a constant that incorporates the prior probability of each event as 
well as the relative costs of misclassification. Even positing a very large weight for 
ci, the class imbalance is much larger and we can assume a ^ 1. 

2.3. Adaptive self-tuning for RFI excision. It is challenging to construct a 
universal detection rule because Equation [4] depends on noise properties that are 
unknown in advance and difficult to model. Instead we use the data stream itself iov 
an empirical nonparametric estimate of the loss function L(t, k). Specifically, we use 
Equation [T] to construct synthetic pulses at known locations and dispersions, add 
these signals to the input stream and then attempt to recover them using a range 
of parameterizations. We set r to be the minimum value that does not produce 
any significant false positives (e.g., very low-a regimes). We use the number of 
synthetic pulses recovered prior to the first m false positives, for low m, as a figure 
of merit to select k. 
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During operations the commensal system buffers 10,000 timesteps of spectrom- 
eter data from tlie correlator and retrains itself on the preceding batch prior to 
detection. It injects 200 dispersed pulses at SNRs from 5 to 9 and DMs from 10 
to 50. This range does not span the full dedispersion search, but it allows the 
system to fit more pulses into the same time interval. This parallel stream is then 
dedispersed and analyzed with all possible excision levels. We compute the detec- 
tion figure of merit permitting a small number m of real positives (typically 20) 
so that single, isolated real pulses do not significantly affect the result. Typically 
the system excises a single antenna but can remove up to four in scans having very 
high interference. In Figure [3] the system elected to excise two antennas, resulting 
in the combined detection score shown in the bottom panel. 

3. Performance 

This section evaluates the adaptive system's performance. The VLBA occa- 
sionally observes known pulsars during its normal operations, which provides a 
blind sensitivity test. To date, the commensal system has detected hundreds of 
events from seven different pulsars (Table [T]). In contrast, the conventional non- 
adaptive approach of summing signals has failed to detect the two weakest pulsars 
(J1919-f-0021 and J0146+5922). Figure |3] shows the J1919-f0021 detection. 



Pulsar name 


Max SNR 


DM 


On-pulse flux density (Jy) 


J0332+5434 


>50 


26 


14.0 


J1136-hl551 


20 


5 


1.0 


J0826-f2637 


19 


20 


0.5 


J1935-hl616 


12 


158 


1.0 


J01645-0317 


10 


36 


0.5 


J0157-f6212 


10 


30 


0.14 


J1919+0021 


9 


90 


0.05 


J0147-f5922 


9 


40 


0.13 


J0953-h0755 


8 


3 


2.0 



Table 1. Known pulsars detected serendipitously by V-FASTR. 



We also compared detection results for observations containing many pulses. 
Figure |4] shows a Receiver Operating Characteristic (ROC) curve for a scan con- 
taining over 20 pulses from J0826-f 2637. Not all of the curve is relevant to our task 
since the correlator can only archive a small number of events in the time interval 
between detection and disk erasure. Typically it saves at most four or five events 
with a priority ordering determined by their signal strength. Consequently the set- 
tings with fewer than five false positives, shaded in gray, are most important. The 
adaptive approach finds a far greater number of real astrophysical events meaning 
it successfully prioritizes weak real pulses over RFI. We have tuned the adaptive 
system for maximum precision in the critical low-false-positive regime. 

A major bottleneck to sustained commensal analyses is the limited analyst time 
available to review any candidate detections. The V-FASTR policy is to manually 
review all detections with a signal-to-noise ratio above 7. Post-hoc analyses of V- 
FASTR performance show that the adaptive system has greatly reduced the overall 
number of these candidates that must be reviewed. April 2012 provided a good 

7 



25 













1 : 
















































J 












Relevant : 


Sum 




performanpe 
regime : 


Adaptive 



5 10 15 20 25 

False Positives Incurred 



Figure 4. Detection performance for pulsar J0826+2637 by adap- 
tive and non-adaptive (summing) approaches. 



test case since there were no pulsars or other known astrophysical transient events 
being observed. We applied a standard RFI filter to remove all signals with a DM 
less than 10. Any remaining detections in April are false positives. Figure [5] shows 
the number of detections at different SNR thresholds, for both adaptive and non- 
adaptive approaches. The non-adaptive approach yields an unfeasibly large number 
of detections for the same SNR threshold. 




Figure 5. The total count of false positive detections for the 
month of April 2012. The vertical axis shows the number of de- 
tections while the horizontal axes show SNR and DM values. Left: 
False positives detected by the standard summing approach. Right: 
False positive detected by the V-FASTR adaptive method. 

Finally, we recorded the excision levels actually selected by the self tuning system 
during this period. Several very clean observations at high frequencies resulted in 
the system choosing fc = 0, tantamount to no RFI protection. These scans did not 
produce any false positive events. For detected events, the proportions of excision 
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levels in use were 54.7% (k==l), 23.9% (k=2), 10.6% (k=3), and 10.7% (k=4). 
Over 78% of the detections used an excision level of just one or two stations. The 
larger excision levels appear only during extremely noisy observations or those 
containing pulsed signals from satellites. Satellite signals can appear in multiple 
stations simultaneously, and are currently the largest source of false positives. 

4. Discussion, Conclusions, and Lessons Learned 

V-FASTR performs adaptive detection of fast transients by commensal analy- 
sis at the Very Long Baseline Array. Its ability to retrain automatically through 
injection of synthetic events has improved reliability over traditional detection ap- 
proaches. This system has operated for over one year and has recently placed new 
limits on the rates of astrophysical phenomena observed at 1.4 GHz and above [13] . 
This makes it a working example of adaptive pattern recognition in data streams 
to enable new science results. V-FASTR continues to accumulate telescope time 
at multiple frequencies; manageable false positive rates mean that the system is 
sustainable for the long term. 

The effective composition and coordination of the team has been crucial factor 
in this successful deployment. The V-FASTR group combines experts in machine 
learning with experts in radio astronomy and the VLBA's core processing system. 
This fusion of expertise kept the machine learning approaches grounded in the 
application and operational perspectives. As a result, the VLBA has adopted V- 
FASTR for full-time use, establishing the policy that all observing campaigns have 
their data analyzed by the commensal system by default. Naturally, any interest- 
ing detections made by V-FASTR would immediately result in the involvement of 
that campaign's Principal Investigator. The provision of immediate data access to 
the automated system is a notable policy innovation that paves the way for more 
commensal automated science concepts. Other instruments and telescopes have 
expressed interest in similar systems to aid in their own processing pipelines. 
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